NSF PAR Search | NSF Public Access Repository


ON THE ELIGIBILITY OF LLMS FOR COUNTERFACTUAL REASONING: A DECOMPOSITIONAL STUDY

Yang, S; Yang, Q; Tang, L; Meng, Y; Guo, N; Blackburn, J; Xi, Z (January 2026, ICLR Conference)

Full Text Available
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study

Yang, S; Yang, Q; Tang, L; Meng, Y; Guo, N; Blackburn, J; Xi, Z (January 2026, ICLR 2026)

Full Text Available
How does training shape the Riemannian geometry of neural network representations?

Zavatone-Veth, JA; Yang, S; Rubinfien, J A; Pehlevan, C (September 2025, NeurIPS 2025 Workshop on Symmetry and Geometry in Neural Representations)

Full Text Available
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Li, J; Yang, Y; Yang, S; Zhang, L; Wu, YN (September 2025, ACL (Association for Computational Linguistics))

Full Text Available
Convex Relaxation for Solving Large-Margin Classifiers in Hyperbolic Space

Yang, S; Liu, P; Pehlevan, C (April 2025, Transactions on Machine Learning Research)

Full Text Available
No Preference Left Behind: Group Distributional Preference Optimization

Yao, B; Cai, Z; Chuang, Y S; Yang, S; Jiang, M; Yang, D; Hu, J (April 2025, The Thirteenth International Conference on Learning Representations)

Preferences within a group of people are not uniform but follow a distribution. While existing alignment methods like Direct Preference Optimization (DPO) attempt to steer models to reflect human preferences, they struggle to capture the distributional pluralistic preferences within a group. These methods often skew toward dominant preferences, overlooking the diversity of opinions, especially when conflicting preferences arise. To address this issue, we propose Group Distributional Preference Optimization (GDPO), a novel framework that aligns language models with the distribution of preferences within a group by incorporating the concept of beliefs that shape individual preferences. GDPO calibrates a language model using statistical estimation of the group's belief distribution and aligns the model with belief-conditioned preferences, offering a more inclusive alignment framework than traditional methods. In experiments using both synthetic controllable opinion generation and real-world movie review datasets, we show that DPO fails to align with the targeted belief distributions, while GDPO consistently reduces this alignment gap during training. Additionally, our evaluation metrics demonstrate that GDPO outperforms existing approaches in aligning with group distributional preferences, marking a significant advance in pluralistic alignment.
Full Text Available
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Cen, S; Mei, J; Goshvadi, K; Dai, H; Yang, T; Yang, S; Schuurmans, D; Chi, Y; Dai, B (April 2025, The Thirteenth International Conference on Learning Representations)

Full Text Available
FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation on GPUs

Wang, D; Kim, D; Jin, B; Zhao, X; Fu, T; Yang, S Y; Liu, X (December 2024, arXiv preprints)

Finetuned large language models (LLMs) have shown remarkable performance in financial tasks, such as sentiment analysis and information retrieval. Due to privacy concerns, finetuning and deploying financial LLMs (FinLLMs) locally is crucial for institutions and individuals. In this paper, we employ quantized low-rank adaptation (QLoRA) to finetune FinLLMs, which leverages low-rank structure and quantization techniques to significantly reduce computational requirements while maintaining model performance. We also employ data and pipeline parallelism to enable local finetuning on commodity GPUs. Experiments on financial datasets validate the efficacy of our approach in yielding notable improvements over the base models.
Full Text Available
Early Lightcurves of Type Ia Supernovae Are Consistent with Nondegenerate Progenitor Companions

https://doi.org/10.3847/1538-4357/ae058b

Burke, J; Andrews, M; Howell, D A; Sand, D J; Amaro, R C; Brown, P J; Andrews, J E; Bostroem, K A; Dong, Y; Haislip, J; et al (November 2025, The Astrophysical Journal)

If Type Ia supernovae (SNe Ia) result from a white dwarf being ignited by Roche-lobe overflow from a nondegenerate companion, then as the SN explosion runs into the companion star its ejecta will be shocked, causing an early blue excess in the lightcurve. A handful of these excesses have been found in single-object studies, but inferences about the population of SNe Ia as a whole have been limited because of the rarity of multiwavelength follow-up within days of explosion. Here we present a 3 yr investigation yielding a nearly unbiased sample of nine nearby (z < 0.01) SNe Ia with exemplary early data. The data are multiwavelength, covering UBVgri and Neil Gehrels Swift Observatory UV bandpasses, and also early, with an average first epoch 16.0 days before maximum light. Of the nine objects, three show early blue excesses. We do not find enough statistical evidence to reject the null hypothesis that SNe Ia predominantly arise from Roche-lobe-overflowing single-degenerate systems (p = 0.94). When looking at the objects’ colors, we find the objects are almost uniformly near-UV–blue, in contrast to earlier literature samples which found that only a third of SNe Ia are near-UV–blue, and we find a seemingly continuous range of B − V colors in the days after explosion, again in contrast with earlier claims in the literature. This study highlights the importance of early, multiwavelength, high-cadence data in determining the progenitor systems of SNe Ia and in revealing their diverse early behavior.
Full Text Available
Calibrated Self-Rewarding Vision Language Models

Zhou, Y; Fan, Z; Cheng, D; Yang, S; Chen, Z; Cui, C; Wang, X; Li, Y; Zhang, L; Yao, H (December 2024, NeurIPS)

Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. Despite these advancements, LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. This misalignment arises because the model tends to prioritize textual information over visual input, even when both the language model and visual representations are of high quality. Existing methods leverage additional models or human annotations to curate preference data and enhance modality alignment through preference optimization. These approaches are resource-intensive and may not effectively reflect the target LVLM's preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input. Empirical results demonstrate that CSR significantly enhances performance and reduces hallucinations across twelve benchmarks and tasks, achieving substantial improvements over existing methods by 7.62%. Our empirical results are further supported by rigorous theoretical analysis, under mild assumptions, verifying the effectiveness of introducing visual constraints into the self-rewarding paradigm. Additionally, CSR shows compatibility with different vision-language models and the ability to incrementally improve performance through iterative fine-tuning.
Full Text Available
